Optimal design

Which new samples to examine next depends on the goal of the investigator.

Maximum information

The differential entropy of the posterior is defined as:

$$ H(y|\mathbf{x}) = E[I(y|\mathbf{x})] = -\int_Y{P(y|\mathbf{x})\cdot \log(P(y|\mathbf{x}))\, dy} $$

To determine which point to examine next, we want to minimize the expected entropy of the posterior distribution after examining that point. Evaluating this expectation for every candidate and always picking the single best point is computationally expensive and gives poor results, so instead we sample candidates stochastically.
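A minimal sketch of this idea, assuming a hypothetical one-parameter model y ≈ θ·x with Gaussian noise and a discretized prior over θ; the grid sizes, noise level, and model are illustrative, not from the source:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: unknown slope theta on a grid, observations
# y ~ N(theta * x, sigma^2) at a chosen design point x.
thetas = np.linspace(-2, 2, 81)                  # parameter grid
prior = np.full(len(thetas), 1.0 / len(thetas))  # flat prior over the grid
candidates = np.linspace(0.0, 1.0, 21)           # candidate design points
sigma = 0.3

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def posterior(p, x, y):
    # Grid posterior over theta after observing y at design point x.
    lik = np.exp(-0.5 * ((y - thetas * x) / sigma) ** 2)
    post = p * lik
    return post / post.sum()

def expected_posterior_entropy(p, x, ys=np.linspace(-3, 3, 201)):
    # Average posterior entropy over the predictive distribution of y at x.
    lik = np.exp(-0.5 * ((ys[:, None] - thetas[None, :] * x) / sigma) ** 2)
    pred = lik @ p
    pred /= pred.sum()
    return sum(w * entropy(posterior(p, x, y)) for w, y in zip(pred, ys))

# Expected information gain of each candidate = prior entropy minus
# expected posterior entropy (non-negative).
gains = np.array([entropy(prior) - expected_posterior_entropy(prior, x)
                  for x in candidates])

# Sample the next design point stochastically, weighted by its gain,
# instead of always taking the argmax.
probs = np.clip(gains, 0.0, None)
next_x = rng.choice(candidates, p=probs / probs.sum())
print(next_x)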

The KL divergence (\cite{box1967discrimination}, p. 62) is perhaps the most useful metric here: it measures the difference in information between the old posterior and the new posterior.
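For discrete posteriors this is a one-liner with scipy; the two example distributions below are made up for illustration:

import numpy as np
from scipy.stats import entropy

# Hypothetical discrete posteriors over the same four parameter values.
old_posterior = np.array([0.25, 0.25, 0.25, 0.25])
new_posterior = np.array([0.10, 0.20, 0.30, 0.40])

# scipy.stats.entropy(p, q) with two arguments returns the KL divergence
# D(p || q) in nats: the information gained in moving from q to p.
print(entropy(new_posterior, old_posterior))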


In [25]:
import scipy.stats

# Differential entropy (in nats) grows with the spread of the distribution.
print(scipy.stats.norm.entropy(0, 1))      # N(0, 1)
print(scipy.stats.norm.entropy(0, 2))      # N(0, 2): scale doubled

print(scipy.stats.uniform.entropy(0, 0.5)) # Uniform(0, 0.5): can be negative
print(scipy.stats.uniform.entropy(0, 1))   # Uniform(0, 1)
print(scipy.stats.uniform.entropy(0, 2))   # Uniform(0, 2)


1.4189385332046727
2.112085713764618
-0.6931471805599453
0.0
0.6931471805599453

Maximum information over a set of models

Box optimizes for information gain over the class of candidate models instead. This is computationally simpler, but less appropriate for open-ended problems.
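A sketch of this style of design, discriminating between two hypothetical candidate models with Gaussian noise of the same known standard deviation (the models and parameter values here are invented for illustration):

import numpy as np

def model_a(x):
    return 1.0 + 2.0 * x        # linear candidate

def model_b(x):
    return 1.0 + 2.0 * x ** 2   # quadratic candidate

sigma = 0.5
candidates = np.linspace(0.0, 1.0, 11)

# For two Gaussians with equal variance, the KL divergence reduces to
# (mu_a - mu_b)^2 / (2 * sigma^2): run the next experiment where the
# candidate models' predictions disagree the most.
divergence = (model_a(candidates) - model_b(candidates)) ** 2 / (2 * sigma ** 2)
print(candidates[np.argmax(divergence)])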

Long run optimization

Thompson sampling: draw a single sample from the current posterior, act as if that sample were the truth, observe the outcome, and update. Repeated over many rounds, this balances exploration and exploitation and optimizes long-run performance.
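A minimal sketch on a made-up two-armed Bernoulli bandit, with a Beta posterior per arm starting from Beta(1, 1); the success probabilities are hypothetical:

import numpy as np

rng = np.random.default_rng(0)

true_p = np.array([0.4, 0.6])   # unknown to the algorithm
alpha = np.ones(2)              # Beta posterior parameters per arm
beta = np.ones(2)

for _ in range(1000):
    # Thompson sampling: draw one sample per arm from its posterior and
    # play the arm whose sampled rate is highest.
    arm = int(np.argmax(rng.beta(alpha, beta)))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))   # posterior means concentrate on the better arm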

Single-best optimization